cudaBayesreg: Bayesian Computation in CUDA

Author

  • Adelino Ferreira da Silva
Abstract

Graphical processing units are rapidly gaining maturity as powerful general parallel computing devices. The package cudaBayesreg uses GPU-oriented procedures to improve the performance of Bayesian computations. The paper motivates the need for devising high-performance computing strategies in the context of fMRI data analysis. Some features of the package for Bayesian analysis of brain fMRI data are illustrated. Comparative computing performance figures between sequential and parallel implementations are presented as well.

A functional magnetic resonance imaging (fMRI) data set consists of time series of volume data in 4D space. Typically, volumes are collected as slices of 64 x 64 voxels. The most commonly used functional imaging technique relies on the blood oxygenation level dependent (BOLD) phenomenon (Sardy, 2007). By analyzing the information provided by the BOLD signals in 4D space, it is possible to make inferences about activation patterns in the human brain. The statistical analysis of fMRI experiments usually involves the formation and assessment of a statistic image, commonly referred to as a Statistical Parametric Map (SPM). The SPM summarizes a statistic indicating evidence of the underlying neuronal activations for a particular task. The most common approach to SPM computation involves a univariate analysis of the time series associated with each voxel. Univariate analysis techniques can be described within the framework of the general linear model (GLM) (Sardy, 2007). The GLM procedure used in fMRI data analysis is often said to be "massively univariate", since data for each voxel are independently fitted with the same model. Bayesian methodologies provide enhanced estimation accuracy (Friston et al., 2002). However, since (nonvariational) Bayesian models draw on Markov Chain Monte Carlo (MCMC) simulations, Bayesian estimates involve a heavy computational burden.
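To make the "massively univariate" idea concrete, the following minimal sketch fits the same simple regression independently to each voxel's time series. It is an illustration only: the on/off stimulus regressor, the two toy voxels, and the function names `fit_voxel` and `massively_univariate_fit` are assumptions introduced here, not part of the paper or of any SPM software.

```python
def fit_voxel(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for one voxel's time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def massively_univariate_fit(x, voxel_series):
    """Fit the same model to every voxel separately; no information is shared."""
    return [fit_voxel(x, y) for y in voxel_series]


# Hypothetical on/off stimulus regressor shared by all voxels (8 scans).
stimulus = [0, 0, 1, 1, 0, 0, 1, 1]

# Two toy voxels: one responds to the stimulus, one does not.
voxels = [
    [10.0 + 2.0 * s for s in stimulus],  # "active" voxel
    [10.0 for _ in stimulus],            # "inactive" voxel
]

estimates = massively_univariate_fit(stimulus, voxels)
```

Because each call to `fit_voxel` touches only one voxel's data, the fits are independent of one another, which is what makes this kind of analysis a natural candidate for parallel execution.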
The programmable Graphic Processor Unit (GPU) has evolved into a highly parallel processor with tremendous computational power and very high memory bandwidth (NVIDIA Corporation, 2010b). Modern GPUs are built around a scalable array of multithreaded streaming multiprocessors (SMs). Current GPU implementations enable scheduling thousands of concurrently executing threads. The Compute Unified Device Architecture (CUDA) (NVIDIA Corporation, 2010b) is a software platform for massively parallel high-performance computing on NVIDIA manycore GPUs. The CUDA programming model follows the standard single-program multiple-data (SPMD) model. CUDA greatly simplifies the task of parallel programming by providing thread management tools that work as extensions of conventional C/C++ constructions. Automatic thread management removes the burden of handling the scheduling of thousands of lightweight threads, and enables straightforward programming of the GPU cores.

The package cudaBayesreg (Ferreira da Silva, 2010a) implements a Bayesian multilevel model for the analysis of brain fMRI data in the CUDA environment. The statistical framework in cudaBayesreg is built around a Gibbs sampler for multilevel/hierarchical linear models with a normal prior (Ferreira da Silva, 2010c). Multilevel modeling may be regarded as a generalization of regression methods in which regression coefficients are themselves given a model with parameters estimated from data (Gelman, 2006). As in SPM, the Bayesian model fits a linear regression model at each voxel, but uses multivariate statistics for parameter estimation at each iteration of the MCMC simulation. The Bayesian model used in cudaBayesreg follows a two-stage Bayes prior approach to relate voxel regression equations through correlations between the regression coefficient vectors (Ferreira da Silva, 2010c).
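The SPMD thread model described above can be pictured with a small sequential emulation in Python. This is only a sketch of the indexing convention (global index = block index times block size plus thread index, with a bounds guard); it runs the "threads" one after another on the CPU, and the names `launch` and `scale_kernel` are hypothetical helpers, not CUDA or cudaBayesreg functions.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch: every (block, thread) pair runs the same kernel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, data, out, factor):
    """Per-thread work: each thread handles the one element selected by its global index."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(data):                       # guard: the grid may overshoot the data
        out[i] = factor * data[i]


n_voxels = 10
block_dim = 4
grid_dim = (n_voxels + block_dim - 1) // block_dim  # enough blocks to cover all voxels

signal = [float(v) for v in range(n_voxels)]
scaled = [None] * n_voxels
launch(scale_kernel, grid_dim, block_dim, signal, scaled, 2.0)
```

The same program text is executed by every thread, and only the index differs. On a GPU the two loops in `launch` would disappear, since the hardware schedules the threads concurrently.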
The Bayesian model in cudaBayesreg closely follows the Bayesian multilevel model proposed by Rossi, Allenby and McCulloch (Rossi et al., 2005), and implemented in bayesm (Rossi and McCulloch, 2008). This approach overcomes several limitations of the classical SPM methodology. The SPM methodology traditionally used in fMRI has several important limitations, mainly because it relies on classical hypothesis tests and p-values to make statistical inferences in neuroimaging (Friston et al., 2002; Berger and Sellke, 1987; Vul et al., 2009). However, as is often the case with MCMC simulations, implementing this Bayesian model on a sequential computer entails significant time complexity. The CUDA implementation of the Bayesian model proposed here has been able to significantly reduce the runtime of the MCMC simulations. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel.

The R Journal Vol. 2/2, December 2010, ISSN 2073-4859, Contributed Research Articles, p. 49

Bayesian multilevel modeling

We are interested in the following Bayesian multilevel model, which has been analyzed by Rossi et al. (2005), and has been implemented as rhierLinearModel in bayesm. Start out with a general linear model, and fit a set of $m$ voxels as

$$ y_i = X_i \beta_i + e_i, \qquad e_i \overset{\text{iid}}{\sim} N\left(0, \sigma_i^2 I_{n_i}\right), \qquad i = 1, \ldots, m. \tag{1} $$

In order to tie together the voxels' regression equations, assume that the $\{\beta_i\}$ have a common prior distribution. To build the Bayesian regression model we need to specify a prior on the $\{\beta_i\}$ coefficients, and a prior on the regression error variances $\{\sigma_i^2\}$. Following Ferreira da Silva (2010c), specify a normal regression prior with mean $\Delta z_i$ for each $\beta_i$,

$$ \beta_i = \Delta z_i + \nu_i, \qquad \nu_i \overset{\text{iid}}{\sim} N\left(0, V_\beta\right), \tag{2} $$

where $z_i$ is a vector of $n_z$ elements, representing characteristics of each of the $m$ regression equations.
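As a rough illustration of how data arise under (1) and (2), the sketch below simulates coefficients and time series for a few voxels using only the Python standard library. Several things here are assumptions made for the example and are not stated in the source: $V_\beta$ is taken to be diagonal, the prior mean is computed in row form as $z_i'\Delta$ (so that it is a vector of $k$ elements, matching row $i$ of the matrix form (3)), and all function names and numeric values are hypothetical.

```python
import random


def draw_beta(delta, z, v_beta_diag, rng):
    """Equation (2): beta_i = mean + nu_i, with mean_j = sum_l delta[l][j] * z[l]."""
    k = len(delta[0])
    mean = [sum(delta[l][j] * z[l] for l in range(len(z))) for j in range(k)]
    # Assumption: V_beta is diagonal, so the k components of nu_i are independent.
    return [mean[j] + rng.gauss(0.0, v_beta_diag[j] ** 0.5) for j in range(k)]


def draw_y(x_rows, beta, sigma, rng):
    """Equation (1): y_i = X_i beta_i + e_i, with e_i ~ N(0, sigma^2 I)."""
    return [sum(xj * bj for xj, bj in zip(row, beta)) + rng.gauss(0.0, sigma)
            for row in x_rows]


def simulate(delta, z_all, x_rows, v_beta_diag, sigmas, seed=0):
    """Simulate coefficients and time series for all m voxels."""
    rng = random.Random(seed)
    betas = [draw_beta(delta, z, v_beta_diag, rng) for z in z_all]
    ys = [draw_y(x_rows, b, s, rng) for b, s in zip(betas, sigmas)]
    return betas, ys


# Hypothetical setup: n_z = 1 (intercept-only z_i), k = 2 coefficients, m = 3 voxels.
delta = [[1.0, 0.5]]                        # n_z x k
z_all = [[1.0], [1.0], [1.0]]               # one z_i per voxel
x_rows = [[1, 0], [1, 1], [1, 0], [1, 1]]   # shared design matrix X_i (4 scans)
betas, ys = simulate(delta, z_all, x_rows,
                     v_beta_diag=[0.04, 0.01], sigmas=[0.5, 0.5, 0.5])
```

With $z_i = 1$ for every voxel, all coefficient vectors scatter around the single row of `delta`, which is the simplest way the common prior ties the voxels together.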
The prior (2) can be written using the matrix form of the multivariate regression model for $k$ regression coefficients,

$$ B = Z\Delta + V \tag{3} $$

where $B$ and $V$ are $m \times k$ matrices, $Z$ is an $m \times n_z$ matrix, and $\Delta$ is an $n_z \times k$ matrix. Interestingly, the prior (3) assumes the form of a second-stage regression, where each column of $\Delta$ has coefficients that describe how the mean of the $k$ regression coefficients varies as a function of the variables in $z$. In (3), $Z$ assumes the role of a prior design matrix. The proposed Bayesian model can be written down as a sequence of conditional distributions (Ferreira da Silva, 2010c),

$$
\begin{aligned}
& y_i \mid X_i, \beta_i, \sigma_i^2 \\
& \beta_i \mid z_i, \Delta, V_\beta \\
& \sigma_i^2 \mid \nu_i, s_i \\
& V_\beta \mid \nu, V \\
& \Delta \mid V_\beta, \bar{\Delta}, A.
\end{aligned}
\tag{4}
$$

By running MCMC simulations on the set of full conditional posterior distributions (4), the full posterior for all the parameters of interest may then be derived.
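The sequence of conditionals in (4) suggests a Gibbs sampler that cycles through the parameters in turn. The sketch below works through the simplest scalar special case ($k = 1$, $n_z = 1$, $z_i = 1$), so that each voxel has a single slope $\beta_i$ and $\Delta$, $V_\beta$ reduce to scalars. It is not the cudaBayesreg or bayesm implementation. The conjugate forms are assumptions chosen for illustration, since the source lists only the conditioning variables: a scaled inverse chi-square prior for $\sigma_i^2$ with degrees of freedom `nu_i` and scale `s2_i`, a scaled inverse chi-square prior for $V_\beta$ with `nu` and `v0`, and a normal prior for $\Delta$ with mean `delta_bar` and variance $V_\beta / A$ (parameter `a`).

```python
import random


def chi2(df, rng):
    """Chi-square draw via a gamma variate (shape df/2, scale 2)."""
    return rng.gammavariate(df / 2.0, 2.0)


def gibbs(xs, ys, n_iter=300, burn=100, seed=1,
          nu_i=3.0, s2_i=1.0, nu=3.0, v0=1.0, delta_bar=0.0, a=0.01):
    """Gibbs sampler for the scalar case; returns the posterior mean of delta."""
    rng = random.Random(seed)
    m = len(ys)
    sxx = [sum(x * x for x in xs[i]) for i in range(m)]
    sxy = [sum(x * y for x, y in zip(xs[i], ys[i])) for i in range(m)]
    beta = [sxy[i] / sxx[i] for i in range(m)]  # start at per-voxel least squares
    sig2 = [1.0] * m
    delta, v_beta = 0.0, 1.0
    delta_draws = []
    for it in range(n_iter):
        for i in range(m):  # independent across voxels given delta and v_beta
            # beta_i | rest: normal with precision-weighted mean
            prec = sxx[i] / sig2[i] + 1.0 / v_beta
            mean = (sxy[i] / sig2[i] + delta / v_beta) / prec
            beta[i] = rng.gauss(mean, prec ** -0.5)
            # sigma_i^2 | rest: scaled inverse chi-square
            sse = sum((y - beta[i] * x) ** 2 for x, y in zip(xs[i], ys[i]))
            sig2[i] = (nu_i * s2_i + sse) / chi2(nu_i + len(ys[i]), rng)
        # v_beta | rest: scaled inverse chi-square (scalar inverse Wishart)
        ss = sum((b - delta) ** 2 for b in beta)
        d2 = a * (delta - delta_bar) ** 2
        v_beta = (v0 + ss + d2) / chi2(nu + m + 1, rng)
        # delta | rest: normal, combining the m slopes with the prior mean
        post_mean = (sum(beta) + a * delta_bar) / (m + a)
        delta = rng.gauss(post_mean, (v_beta / (m + a)) ** 0.5)
        if it >= burn:
            delta_draws.append(delta)
    return sum(delta_draws) / len(delta_draws)
```

In this sketch the inner loop over voxels is the part that is independent across `i` once `delta` and `v_beta` are fixed, so it is the step that lends itself to one thread per voxel. The updates of `v_beta` and `delta` pool information across all voxels and have to wait for that loop to finish.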

Similar resources

Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices

In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. T...

MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling

Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...

Montblanc: GPU accelerated Radio Interferometer Measurement Equations in support of Bayesian Inference for Radio Observations

We present Montblanc, a GPU implementation of the Radio interferometer measurement equation (RIME) in support of the Bayesian inference for radio observations (BIRO) technique. BIRO uses Bayesian inference to select sky models that best match the visibilities observed by a radio interferometer. To accomplish this, BIRO evaluates the RIME multiple times, varying sky model parameters to produce m...

CUDA Accelerated LTL Model Checking

Recent technological developments made available various many-core hardware platforms. For example, a SIMD-like hardware architecture became easily accessible for many users who have their computers equipped with modern NVIDIA GPU cards with CUDA technology. In this paper we redesign the maximal accepting predecessors algorithm [7] for LTL model checking in terms of matrix-vector product in ord...

Computing Optimal Cycle Mean in Parallel on CUDA

Computation of optimal cycle mean in a directed weighted graph has many applications in program analysis, performance verification in particular. In this paper we propose a data-parallel algorithmic solution to the problem and show how the computation of optimal cycle mean can be efficiently accelerated by means of CUDA technology. We show how the problem of computation of optimal cycle mean is...

Publication date: 2010